CPU support #45

Merged
merged 22 commits into from
Mar 20, 2024

Conversation

pierotofy
Owner

@pierotofy pierotofy commented Mar 20, 2024

Adds CPU support, which makes it possible to generate splats without a GPU (if you have time to wait).

Currently it runs about 100x slower than CUDA, but there's probably room for improvement. In particular, there might be ways to parallelize some calculations in the renderer, and it's worth evaluating whether the project_forward and compute_sh_forward functions could benefit from a manual C++ implementation. That would mean no longer relying on libtorch for them, which currently gives us the backward passes for free.
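
To make the parallelization idea concrete, here is a minimal sketch of one way per-pixel work could be split across CPU threads. It is not the code in this PR: the `Contribution` struct, the `compositePixel` and `compositeParallel` names, the single color channel, and the `1e-4` early-exit threshold are all assumptions made for illustration. It relies only on the fact that, in front-to-back alpha compositing, each pixel can be computed independently once its contributions are sorted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical per-pixel input: splat contributions already sorted front to back.
struct Contribution {
    float alpha;  // opacity in [0, 1]
    float color;  // single channel, for brevity
};

// Front-to-back alpha compositing for one pixel.
float compositePixel(const std::vector<Contribution>& sorted) {
    float T = 1.0f;    // remaining transmittance
    float out = 0.0f;  // accumulated color
    for (const Contribution& c : sorted) {
        out += T * c.alpha * c.color;
        T *= 1.0f - c.alpha;
        if (T < 1e-4f) break;  // illustrative early exit once nearly opaque
    }
    return out;
}

// Composite all pixels by splitting the pixel range into contiguous chunks,
// one per worker thread. Each thread writes to a disjoint slice of `image`,
// so no locking is needed.
std::vector<float> compositeParallel(
        const std::vector<std::vector<Contribution>>& pixels,
        unsigned numThreads) {
    assert(numThreads > 0);
    const std::size_t n = pixels.size();
    std::vector<float> image(n, 0.0f);
    const std::size_t chunk = (n + numThreads - 1) / numThreads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < numThreads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin == end) break;  // no work left for further threads
        workers.emplace_back([&pixels, &image, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                image[i] = compositePixel(pixels[i]);
        });
    }
    for (std::thread& w : workers) w.join();
    return image;
}
```

Calling `compositeParallel(pixels, std::thread::hardware_concurrency())` would use one chunk per hardware thread (guarding against a return value of 0). A real renderer would more likely parallelize over image tiles and would also need the matching backward pass.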

The results should match the CUDA implementation, up to small floating-point precision differences.
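
One way to check that kind of equivalence is an element-wise comparison with a tolerance rather than exact equality. The sketch below is only an illustration: the `allClose` helper and its default tolerances are assumptions, not part of this PR. The formula follows the common `|a - b| <= atol + rtol * |b|` convention.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Returns true when both buffers have the same length and every pair of
// elements satisfies |a - b| <= atol + rtol * |b|. NaNs never compare close.
// The default tolerances are illustrative, not tuned for any renderer.
bool allClose(const std::vector<float>& a, const std::vector<float>& b,
              float rtol = 1e-4f, float atol = 1e-6f) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::isnan(a[i]) || std::isnan(b[i])) return false;
        if (std::fabs(a[i] - b[i]) > atol + rtol * std::fabs(b[i])) return false;
    }
    return true;
}
```

Here `a` could hold values rendered on the CPU and `b` the same values rendered with CUDA, flattened into a single buffer.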

Closes #5

Try it by passing the --cpu flag to ./opensplat

@pierotofy pierotofy added the enhancement New feature or request label Mar 20, 2024
@pierotofy pierotofy merged commit 5b06052 into main Mar 20, 2024
22 checks passed
@pfxuan
Collaborator

pfxuan commented Mar 20, 2024

This feature could become very useful in the near future. 100x slower is actually not too slow. With extra CPU-based optimizations like Intel AVX-512 and AMX, I think another 10x performance boost should be feasible.

@pierotofy
Owner Author

Agree! I was actually quite pleased with the performance (and it's a first implementation, so lots of room for improvements).
